ABSTRACT


Indoor location is currently an active research area, with various technologies in use. This research examines the
most important criteria and variables to consider when installing and configuring an indoor environment and provides a
method for determining the best configuration. The RSSI signal provided by Bluetooth devices is used in this setting.
The parameters to be taken into account while configuring an environment are identified in view of the many problems
that this sort of technology poses, both in the emission and in the reception of the signal. To create this location
platform, an Android application is developed using the platform's native Bluetooth libraries; it requires at least
API 18 (Android 4.3 Jelly Bean), the first version with BLE 4.0 support. Fingerprinting with machine learning
classifiers such as Support Vector Machine and k-Nearest Neighbours is evaluated in addition to trilateration based on
the Rappaport radio propagation model. To evaluate these methodologies and discover the critical factors in deploying
and building the environment for localization, a range of physical experimentation locations were used. Finally, to
improve the accuracy of this strategy, a novel way of determining the ideal configuration of the environment is
proposed.
INTRODUCTION


Wi-Fi, Bluetooth, and ZigBee are currently being investigated for indoor location, as are techniques and algorithms
such as trilateration, distance measurement (the Rappaport equation), and various classification algorithms.
Researchers are interested in Beacons because they are inexpensive to install and maintain. RSSI data captured by
mobile devices is analyzed using triangulation or fingerprinting methods to estimate indoor position. Other approaches,
such as crowd sensing, where a database is constantly updated to better profile Beacons [1], have recently been
examined. A radio propagation model of the signal was created using Wi-Fi [2] and trilateration, with a precision of
about 2 meters. With trilateration and Bluetooth RSSI analysis [3], the signal arrival time [4] can be calculated with
the same error as the prior method. Indoor location using fingerprinting is widely used. This technique discretizes an
area and determines, by pattern similarity, which discrete sector a signal or vector of signals corresponds to; that
is, it treats location as a classification problem. These studies use Beacons or another BLE 4.0 device. Because RSSI
signals fluctuate constantly, waiting times of up to 3 minutes are required to obtain reliable values. Mohsin et al.
(2019) [5] conclude that the device's configuration and the model used to estimate the position should be studied
according to the space in which they will work. Considering these developments and the Bluetooth 4.0 protocol
(available on mobile devices), this work analyses the main criteria that affect the correct deployment and
configuration of an environment for indoor location. A series of algorithms based on Artificial Intelligence
techniques for indoor location with Beacons as emitters is implemented. It should be noted that Beacons are BLE 4.0
devices with the particularity that they are battery powered, with a lifetime of several years.
2.0 MATERIALS AND METHODS 


When working with wireless devices, two techniques are commonly used: (i) create a radio propagation model, that is,
find a direct relationship between RSSI and distance, as suggested by Rappaport's equation, and reduce the error using
noise filter methods; and (ii) analyze patterns of RSSI values in discrete spaces, reducing the problem to a
classification problem. Both techniques are described in detail in this section, together with the Feature Selection
criteria and the metrics used.
2.1 Experimental Areas 


• Experimental area 1: For the fingerprinting experiments, a portion of the first room of the ReTiCS laboratory was
initially used, specifically an area of 3x4 m, as shown in Figure 1, discretized into 1 m² spaces. The edges of the
area are bordered by shelves, rooms with open doors and lateral windows; the space is also congested, with desks and a
high presence of routers on the ceilings.

• Experimental area 2: The same room as Experimental Area 1 was used, but over a larger area, as shown in Figure 4 (I)
& (II). Spaces of 1 m² were discretized in an area of 9.6x6.3 m, leaving 0.5 meters of space between them. In
addition, a distance of between 0.5 and 1.5 meters is left between the beginning of the discretized zones and the
walls and Beacons used in the measurements.

• Hall 1: Hall 3A, in front of the Communication Engineering Research lab, University of Baghdad, has the following
characteristics:
  • One side wall is 100% glass throughout its length. The height of the aisle is uniform. Foot traffic is very
  frequent while measurements are taken.

• Hall 2: The main entrance of Hall 3A has these characteristics:
  • On one side there is a glass staircase. On the other side wall there are some offices, most of which have windows.
2.2. Mobile Devices 


Two types of mobile devices were used as receivers of the RSSI signal: a Smartphone and a Raspberry Pi 2. The
Smartphone, running Android 5.1, was used for the first part of the work. An application was implemented to capture
the RSSI signals of the Beacons in the environment and to compute basic statistics on these measurements, such as the
average and standard deviation. The Raspberry Pi 2, fitted with a USB Bluetooth antenna, was used for the second part
of the work, and a script with the same objective as the Smartphone application was implemented.
2.3. Radio Propagation Model 
To obtain an intensity-distance relationship, the following procedure was performed:

RSSI measurements of a Beacon were taken for one minute at each point with a mobile device and an Android application
developed for this purpose, considering the Beacon's position as position 0 (0 meters) and moving in a straight line,
leaving 1 meter of separation between measurement points. The data was ordered by position and the values were
optionally filtered. Finally, the mean RSSI value was taken for each measurement point.
The captured values were filtered using the technique described in [6]. The captured RSSI values are ordered and a
percentage is designated to eliminate the extremes, keeping the intermediate values; in this way the so-called
outliers are discarded. The equation that best fits the captured data was determined using the Rappaport equation [7],
given below.
$$\mathrm{RSSI} = -\left(10\,n\,\log_{10} d + A\right) \qquad \text{(Eq. 2.1)}$$


In Equation 2.1, the RSSI value depends on the value A, which corresponds to the RSSI of the source at a distance of
1 meter (at d = 1 the logarithmic term vanishes, so RSSI = -A); to calculate this value, prior calibration is carried
out on each device. The value d is the distance between the emitter and the receiver, and the factor n is an
environment coefficient that varies according to the location and the device used. This coefficient can be calculated
by solving the equation for known values obtained at calibration. The main disadvantage of this method is precisely
the calculation of the environment coefficient, since it depends on the device and the environment: calibration is
necessary for each device and for each environment in which it works.
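The calibration described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Android or Matlab code used in this work: the function names and the 10% trimming fraction are hypothetical, the trimming is a simplified version of the filter in [6], and n is estimated by least squares over all calibration points rather than from a single known pair. The measurement at position 0 is excluded because the logarithm is undefined there.

```python
import math

def trimmed_mean(values, trim_fraction=0.1):
    """Order the readings, drop trim_fraction of them at each extreme, average the rest.

    values must be non-empty; trim_fraction is an assumed setting.
    """
    ordered = sorted(values)
    cut = int(len(ordered) * trim_fraction)
    kept = ordered[cut:len(ordered) - cut] or ordered  # fall back if everything was cut
    return sum(kept) / len(kept)

def fit_environment_coefficient(distances_m, mean_rssi, a):
    """Least-squares estimate of n in RSSI = -(10 n log10(d) + A).

    Rearranged, -(RSSI + A) = n * (10 log10 d): a line through the origin.
    Points at d = 1 m carry no information about n, since log10(1) = 0.
    """
    xs = [10.0 * math.log10(d) for d in distances_m]
    ys = [-(r + a) for r in mean_rssi]
    denom = sum(x * x for x in xs)
    if denom == 0:
        raise ValueError("need at least one distance other than 1 m")
    return sum(x * y for x, y in zip(xs, ys)) / denom

def rssi_at(distance_m, n, a):
    """Equation 2.1: expected RSSI at a given distance (distance must be > 0)."""
    return -(10.0 * n * math.log10(distance_m) + a)

def distance_from_rssi(rssi, n, a):
    """Equation 2.1 solved for d."""
    return 10.0 ** (-(rssi + a) / (10.0 * n))
```

In use, `trimmed_mean` would be applied to the readings of each measurement point, the resulting means passed to `fit_environment_coefficient` together with the calibrated A, and `distance_from_rssi` used to turn a new reading into a distance estimate.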
2.4. Fingerprinting


Fingerprinting methods use the captures made from several wireless devices simultaneously, at different positions. In
an offline phase, these captures are saved in a database together with their respective positions. Then, in an online
phase, new measurements are captured with a device and compared with the database to determine, by similarity, which
space they belong to. Given the characteristics of the method, if it is decided to change the distribution of the
Beacons, a new offline measurement will be necessary for the new system. The steps used to apply this technique were
the following:

The work environment was discretized into 1 m² spaces, each called a sector. The RSSI of all devices in the
environment was measured in every sector, facing in all directions. In post-processing, the measurements of the
Beacons were arranged as a vector of measurements, with each vector labeled with the corresponding sector. New
measurements were then taken with a mobile device, following the same format. The most likely sector to which a newly
captured measurement corresponds was determined using a learning algorithm that performs classification.
The following describes how the classification algorithms used together with this method work: 
2.4.1 k-Nearest Neighbors (k-NN) 


The k-NN algorithm is a very simple classification method that compares the value to be classified with a database;
its result is determined by the most frequent class among the k nearest neighbors. Due to the simplicity of its
operation, it is one of the most widely used methods for classification. The distance between two elements is
generally determined by the Euclidean distance, although another distance criterion can be chosen depending on the
approach to the problem. The Euclidean distance was chosen for the present work since it is better suited to this type
of problem.


The final result of the classification can be determined by multiple criteria; in this case, two are used: (i) Mode
Distance (MD), the conventional method, which uses the most frequent class among the 'k' nearest neighbors; and (ii)
Weighted Distance (WD), which performs a weighting among the 'k' neighbors, so as not to discard information that may
have some relevance. Using k = 5, that is, 5-NN, results in a vector of 5 elements. Let A = {y, y, x, x, x}, ordered
from closest to furthest, with the positions x = (a, b) and y = (c, d). According to the criteria, the results are:

• By mode (conventional): The class repeated the most in A is 'x'; therefore, the final result would be the position
(a, b).
• By weighted distance: A penalty can be assigned to each neighbor; for example, for A, the final position can be
calculated from the weighted neighbors.

The k-NN search function of the Matlab libraries was used. The Matlab R2014a version [8] was used for the calculations
in this work.
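The two criteria can be illustrated with a short Python sketch (not the Matlab routine used in this work). The exact weighting is not specified in the text, so inverse-distance weights are assumed here; the function names and the `eps` term are hypothetical.

```python
import math
from collections import Counter

def k_nearest(query, database, k):
    """database: list of (rssi_vector, (x, y)) pairs.

    Returns the k closest entries as (euclidean_distance, position), nearest first.
    """
    scored = []
    for vector, position in database:
        dist = math.sqrt(sum((q - v) ** 2 for q, v in zip(query, vector)))
        scored.append((dist, position))
    scored.sort(key=lambda item: item[0])
    return scored[:k]

def position_by_mode(neighbours):
    """MD: the most frequent position among the neighbours; ties go to the nearest."""
    counts = Counter(position for _, position in neighbours)
    best = max(counts.values())
    for _, position in neighbours:  # neighbours are ordered nearest first
        if counts[position] == best:
            return position

def position_by_weighted_distance(neighbours, eps=1e-6):
    """WD (assumed weighting): positions averaged with weights 1 / (distance + eps)."""
    weights = [1.0 / (dist + eps) for dist, _ in neighbours]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, neighbours)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, neighbours)) / total
    return (x, y)
```

For a neighbour list shaped like A = {y, y, x, x, x}, `position_by_mode` returns the position of x, while `position_by_weighted_distance` returns a point between x and y, pulled towards y because the two closest neighbours carry the largest weights.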
2.4.2. Support Vector Machine (SVM) 


The SVM method learns from a series of inputs and correlates them to specific outputs. In its basic form, SVM is a
non-probabilistic binary classifier. The model is built from an input set during the learning phase; it can then
recognize new inputs and assign them an output. SVM maximizes the margin between classes; for nonlinear
classification, it uses kernel functions that transform the space into a higher dimension. Polynomial kernels of
degrees 2 and 3 were used here. 2/3 of the data was used for training and 1/3 for validation. Matlab and the libsvm
library, which facilitates multi-class classification [9], were used.
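For reference, the polynomial kernel as defined in libsvm can be written as follows, where g denotes the degree (2 or 3 in this work). The parameters γ and r (libsvm's gamma and coef0) are part of the library's definition; their values are not stated in this work.

```latex
K(\mathbf{u}, \mathbf{v}) = \left(\gamma\, \mathbf{u}^{\top}\mathbf{v} + r\right)^{g}, \qquad g \in \{2, 3\}
```

Here u and v are two RSSI vectors, one component per Beacon.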
2.5 Relevance of Emitters

To study the correlation and relevance of the variables in a classification problem, a technique known as Feature
Selection is used to determine the best subset of variables to consider.
2.5.1. ExtraTrees 


ExtraTrees (Extremely Randomized Trees) builds multiple models (randomized trees) from the training sample and then
averages them. This experiment was conducted using the scikit-learn (Sklearn) Python library with the default
hyperparameters.
2.5.2. Gradient Boosting Classifier

This classifier also uses decision trees as its foundation and weighted voting as a criterion for selection; it builds
its model in stages, each new tree being added on top of the previous model [10]. The Sklearn Python library with the
default hyperparameters was employed.

These algorithms calculate a score that represents the importance or relevance of each variable, as a percentage, in
the classification process.
2.6 Metrics [11] 
The metrics used to analyze the results obtained are the following:
• Global Accuracy
• Local Accuracy
• Average Error (Global)
• Local Error
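The four metrics are defined in [11]; the Python sketch below shows one plausible reading and should be taken as an assumption rather than as the definitions of that reference. Here "global" is computed over all validation samples and "local" per true sector, accuracy is the fraction of correctly classified samples, and error is the Euclidean distance in meters between the centres of the predicted and true sectors. The function name and the `centres` mapping are hypothetical.

```python
import math
from collections import defaultdict

def evaluate(true_sectors, predicted_sectors, centres):
    """centres maps a sector label to its (x, y) centre in meters (assumed layout)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    error_sum = defaultdict(float)
    for true, predicted in zip(true_sectors, predicted_sectors):
        totals[true] += 1
        hits[true] += int(true == predicted)
        (tx, ty), (px, py) = centres[true], centres[predicted]
        error_sum[true] += math.hypot(px - tx, py - ty)
    n = sum(totals.values())
    return {
        "global_accuracy": sum(hits.values()) / n,
        "local_accuracy": {s: hits[s] / totals[s] for s in totals},
        "average_error": sum(error_sum.values()) / n,
        "local_error": {s: error_sum[s] / totals[s] for s in totals},
    }
```

The per-sector values are the kind of quantity a heat map of positioning error per sector would display.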


3.0 RESULTS AND DISCUSSIONS 
3.1 Data Mining for Bluetooth-Smartphone 


The analysis carried out on the indoor location problem is presented, identifying the main variables that impact the
final result, such as the behavior of the RSSI, the selection of the deployment area, the technique, the receiving
device, and the TxPower. JAALEE Beacon devices were used to carry out the tests.
3.2 About RSSI Attenuation

To study the RSSI attenuation, tests were carried out in 2 different places in Hall 3A of the Communication
Engineering Laboratory, University of Baghdad, Iraq. For this method, JAALEE Beacons and a Smartphone with Android 5.1
were used, the latter as the mobile device, with the Wi-Fi antenna disabled to avoid possible interference.
3.2.1 Hall 1

One Beacon was placed at an initial point, and measurements were taken at every 1 meter of distance. The precision is
compared for each intensity by means of the Mean Square Error and the Standard Deviation. In corridor 1, the Mean
Square Error was 8.26, 13.65, and 3.67 and the Standard Deviation was 13.07, 16.07, and 9.13 for TxPower 0x06, 0x07,
and 0x08, respectively. In corridor 2, the Mean Square Error was 12.03, 6.41, and 6.62 and the Standard Deviation was
13.34, 16.88, and 25.74 for the same TxPower values. The range of RSSI values taken by the devices varies according to
the intensity used; in some cases, as the plots show, the filtered value matches the raw value.
3.2.2. Hall 2 


The same procedure as in the previous step was performed. The results were obtained in raw form and after a noise
filter, and are compared with the theoretical Rappaport curve in each case. These results show that the range and
amplitude of values are irregular and that there is no sense in looking for a direct RSSI-distance relationship. The
graphs show the same RSSI value for different Beacon distances, so working with a trilateration method is not a viable
solution, as it would not provide reliable accuracy.


However, similar behaviors can be identified in both corridors; for example, at distances of 1 to 8 meters the
experimental points fit better, while values beyond these distances no longer reflect a particular behavior, except
for TxPower = 0x08, for which the signal decay is not noticeable. Despite this, the lower intensities fit the
theoretical values better. Restricted to distances of up to 8 m in corridor 1, the Mean Square Error is 2.25, 9.45,
and 3.66 and the Standard Deviation (SD) is 6.81, 13.38, and 9.11 for TxPower 0x06, 0x07, and 0x08, respectively. When
comparing these results with the full-length corridor 1 results, an improvement is observed in the mean error and in
the range of standard deviations, except for TxPower = 0x08. From this comparison, it is concluded that limiting the
size of the experimental area can play an important role when applying another technique. Since this analysis shows
that the distance between emitter and receiver plays an important role in the estimation of RSSI, different types of
configurations were considered when starting the deployment of the experimental areas. To perform this analysis, the
fingerprinting technique was applied.


The Concurrent and Real-Time Systems (ReTiCS) laboratory was used for these tests, where two variables were analyzed:
the discretization of space and the distribution of the Beacons. The final results determined the relevance of these
variables and the base TxPower on which to work.

The first configuration covered 12 m², discretizing a 4x3 meter rectangle into 1 m² sectors; it was tested with
distributions of 4 and 5 Beacons placed around this rectangle.

The second configuration was in the same laboratory, on an area of 9.6 x 6.3 meters, discretized into sectors of
1 m², this time separated by distances of 0.5 meters and forming a rectangle of 5x3 discrete spaces within the
described area.
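The two discretizations can be expressed as a small Python helper that returns the centre of each sector. This is a sketch under assumptions: the function name, the row-by-row sector numbering, and the origin at the lower-left corner of the first sector are all hypothetical, and the margins to the walls and Beacons are left out.

```python
def sector_centres(columns, rows, side=1.0, gap=0.0, origin=(0.0, 0.0)):
    """Return {sector_index: (x, y)} with the centre of every square sector in meters.

    side is the sector edge length and gap the separation between adjacent sectors.
    """
    pitch = side + gap  # distance between the same corner of two adjacent sectors
    centres = {}
    for row in range(rows):
        for col in range(columns):
            sector = row * columns + col
            centres[sector] = (
                origin[0] + col * pitch + side / 2.0,
                origin[1] + row * pitch + side / 2.0,
            )
    return centres

# First configuration: 4x3 contiguous sectors of 1 m².
area_1 = sector_centres(4, 3)
# Second configuration: 5x3 sectors of 1 m² separated by 0.5 m.
area_2 = sector_centres(5, 3, gap=0.5)
```

These centres are the (x, y) positions that a positioning error in meters would be measured against.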
3.2.3 Experimental Area 1 


This test covered an area of 12 m², discretized into contiguous sectors of 1 m², distributed in a 4x3 matrix. The
following criteria were considered during the experiment:
• In each sector, measurements were taken on-site, both static and walking in all directions. Samples were taken for
5 to 6 minutes in each sector. To determine a vector of measurements, the mobile application waits until RSSI values
have been fetched from all Beacons at least once.
• If more than one value is captured for a Beacon, the mean is taken, leaving a 4- or 5-dimensional vector as each
element.
• No noise filter is applied for this type of measurement.

Figure 1: Beacons Distribution for Experimental Area 1.


The analyzed space is shown in Figure 1, where we initially work with 5 Beacons, labeled Be07, Be08, Be09, Be10 and
Be11. All Beacons work with TxPower = 0x07, since the space to be analyzed is small and an obvious reduction in RSSI
is sought to identify sectors; as explained above, for distances of less than 8 meters, fewer errors are obtained in
the estimation of the position.


To determine blind spots in the configuration, or whether a smaller number of Beacons can be used without
compromising performance, 5 extra tests were done in this area, each removing one Beacon from the distribution.

The following tests were carried out using 231 training elements and 99 validation elements, chosen randomly. The
results shown correspond to the average value over 50 executions of the algorithm.
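The test protocol above (six Beacon configurations, a random 231/99 split, 50 repetitions) can be sketched in Python as follows. The helper names, the `evaluate` callback, and the fixed random seed are hypothetical and are not taken from this work; `evaluate` stands for any function that trains on the first set and returns the mean error on the second.

```python
import random
from itertools import combinations

BEACONS = ["Be07", "Be08", "Be09", "Be10", "Be11"]

def beacon_configurations(beacons):
    """The full set of Beacons plus every subset that removes exactly one of them."""
    return [tuple(beacons)] + list(combinations(beacons, len(beacons) - 1))

def project(vector, beacons, config):
    """Keep only the RSSI components of the Beacons present in config."""
    return [vector[beacons.index(b)] for b in config]

def repeated_holdout(samples, evaluate, n_train=231, n_valid=99, runs=50, seed=0):
    """Average evaluate(train, valid) over several random splits of the samples."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        train = shuffled[:n_train]
        valid = shuffled[n_train:n_train + n_valid]
        scores.append(evaluate(train, valid))
    return sum(scores) / len(scores)
```

Each configuration returned by `beacon_configurations` would be applied to every stored vector with `project` before calling `repeated_holdout`.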
Case 1: k-NN 


The k-NN algorithm was used, varying the hyperparameter k to identify the value that gives the best results. Two
approaches were evaluated, with the following average errors: the mode (MD), with 1.51 m for k = 1, 1.72 m for k = 3,
and 1.68 m for k = 5; and the weighted distance (WD), with 1.51 m for k = 1, 1.34 m for k = 3, and 1.29 m for k = 5.
The position (x, y) of each sector is taken as the midpoint of its square meter.

The level of error was obtained for each configuration (Be07, Be08, Be09, Be10, Be11) using the two approaches. In
addition, Figures 2 (I) and (II) indicate the error in the sectors, with heat maps, for the best results.


Impact Factor (JCC): 11.1093 NAAS Rating: 3.76 


Techniques for Indoor Location-Based on Bluetooth Fingerprinting using Artificial Intelligence Algorithms 67 


Figure 2: (I) Positioning Error (m) using k-NN, according to WD and MD. Heat Maps for TxPower = 0x07. Panels: (A) WD -
k=5 - without Be09; (B) MD - k=1 - without Be11. (II) Positioning Error (m) using SVM. Heat Maps for TxPower = 0x07.
Panels: (A) Polynomial Kernel g=2; (B) Polynomial Kernel g=2.


From the results, it is observed that for k = 1 the same results are obtained with both criteria, while for k = 3 and
k = 5 the WD criterion gives the best results for all the Beacon configurations. With k = 5, the WD criterion improves
the error by about 0.4 meters. The configuration that gives the least error is the one that omits Beacon Be09, with
1.27 m. It is inferred that a distribution of 1 Beacon per corner in an area of 4x3 meters gives acceptable precision;
also, according to the heat maps, the areas near the Beacons give the worst results. It must be taken into account
that the work was done in a small area, so although the error is small, it corresponds to approximately 50% of the
total area.
Case 2: SVM 
Three cases were analyzed: a linear kernel, a polynomial kernel of degree 2, and a polynomial kernel of degree 3.

The mean error using SVM was obtained in experimental area 1 for the six types of configurations of the Beacons (Be07,
Be08, Be09, Be10, Be11). The average errors over all Beacon configurations, according to the kernel used, are 1.64 m
for the linear kernel, 1.62 m for the polynomial kernel with g = 2, and 1.63 m for the polynomial kernel with g = 3.
Figures 2 (I) and (II) show heat maps indicating the error in each sector for the best results.


From the results obtained in the above case, it can be seen that the mean error using SVM is slightly higher than that
of k-NN with the WD criterion. For the k-NN case (see Figure 3 (I)), the heat maps show that the positioning error is
randomly distributed and that the central zone has the lowest error. For SVM, the error distribution is uniform but of
greater magnitude. The execution times for the k-NN, SVM-Linear, and SVM-Polynomial techniques are 0.219, 36.97, and
32.94 milliseconds (ms), respectively. Following these results, a larger area is considered, in which border zones
between the sectors are added.
3.2.4 Experimental area 2 


A space with a larger area, 9.6x6.3 meters, was analyzed, leaving 0.5 meters of distance between sectors of 1 m² each,
and with a margin of approximately 0.7 to 1.7 meters between the Beacons and the measurement points. Figure 3 (II)
shows a diagram of the space and the distribution of Beacons used. This space is also located in the ReTiCS
laboratory. The intense presence of routers and other Bluetooth devices (both Beacons and microcontrollers) throughout
the environment must be considered, since it generates more noise in the measurements obtained for the experiment.
k-NN was used since it gave better results than SVM in experimental area 1. In this case, the aim is to study the
behavior of the base TxPower of the Beacons. The initial analysis of Rappaport's equation is also taken into account:
the higher the TxPower, the better the fit of the signal for distances of up to 8 meters, which should reduce the
average error.


www.Uprc.org editor @tjpre.org 


68 Khalid Twarishalhamazani & Jalawialshudukhi 


3.2.4.1 TxPower Analysis on Location 


Two JAALEE TxPower settings are used: TxPower = 0x04 and TxPower = 0x07. Regarding data collection, the following
criteria can be mentioned:

• Measurements were taken for 2 or 3 minutes in each sector, moving slowly in various directions so as not to alter
the signal's stability. No noise filter was applied to the captures. In this case, NA values were considered: as the
space is larger, the Beacons do not cover 100% of it. This aspect is used in the analysis as a discriminant to
determine more easily to which sector each measurement belongs.

• For each second that passes, the captured values are collected and the average is calculated; if no value was
captured from a Beacon, the lowest value (-120) is assigned as a constant in the data vector.
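The per-second aggregation in the last criterion can be sketched in Python as follows. The reading format `(timestamp_s, beacon_id, rssi)` and the function name are assumptions made for illustration; only the averaging per second and the constant -120 for a silent Beacon come from the description above.

```python
from collections import defaultdict

MISSING_RSSI = -120  # constant assigned when a Beacon was not heard during that second

def build_vectors(readings, beacons, missing=MISSING_RSSI):
    """Group readings by whole second and average them per Beacon.

    readings: iterable of (timestamp_s, beacon_id, rssi) tuples (assumed format).
    Returns one vector per second that has at least one reading, ordered by time,
    with components in the order given by beacons.
    """
    per_second = defaultdict(lambda: defaultdict(list))
    for timestamp, beacon, rssi in readings:
        per_second[int(timestamp)][beacon].append(rssi)
    vectors = []
    for second in sorted(per_second):
        seen = per_second[second]
        vectors.append([
            sum(seen[b]) / len(seen[b]) if b in seen else missing
            for b in beacons
        ])
    return vectors
```

Because a silent Beacon always maps to the same constant, its absence becomes a usable feature for the classifier, which is the discriminant role described in the first criterion.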


Figure 3: (I) Accumulated Error (Cumulative Probability) for 5-NN using WD, Excluding Beacon Be09. (II) Beacons
Distribution for the Experimental Area.


The training and validation samples used for each TxPower (0x07 and 0x04), with approximately 60 data vectors per
sector in the zone, were as follows: an RSSI at 1 m of -74 dBm and -56 dBm; 756 and 609 training vectors; and 291 and
260 validation vectors, respectively. For each TxPower, the best results are obtained with configurations of 5
Beacons. It is observed that with the highest TxPower (0x04) and under the WD criterion, the mean error is reduced by
approximately 17 cm.

The k-NN mean errors (average values) over the different Beacon configurations (Be07, Be08, Be09, Be10, Be11, in 6
sets each), with TxPower = 0x04 and TxPower = 0x07, are 2.31 m for k = 1, 2.29 m for k = 2, and 2.35 m for k = 5 using
MD, and 2.38 m for k = 1, 2.43 m for k = 2, and 2.37 m for k = 5 using WD.


Figures 4 (I) & (II) show the errors for the sectors by heat maps. In this case, a behavior similar to that of the
previous area is observed, where balanced mean errors improve the global error. Furthermore, the sectors near the
corners have a greater error than the central zones, where the RSSI decreases and the values are better
differentiated, improving the classification.


With these results, it has been identified that the discretization of the experimental area and the distribution of
the Beacons have a direct impact on the final results, with the best configuration to date being the use of 5 Beacons
with TxPower = 0x04 and k-NN (k = 5) with the WD criterion.

Figure 4: (I) Positioning Error (m) using k-NN. Heat Maps for TxPower = 0x04. (II) Positioning Error (m) using k-NN.
Heat Maps for TxPower = 0x07.


3.3 Comparison between Experimental Areas 


Experimental areas 1 and 2 have an area of 12 m² and 60.48 m², respectively. To compare the results obtained in them
directly, a mean error coefficient [12] is used, which relates the mean error to the Total Area.


The coefficients for the best results under the k-NN (MD) algorithm are 18.75 for area 1 [07, 08, 10, 11; k=1] and
7.49 for area 2 [07, 08, 09, 10, 11; k=5]; in the same way, under k-NN (WD) they are 13.34 [07, 08, 09, 10; k=5] and
5.52 [07, 08, 09, 10, 11; k=5] for area 1 and area 2, respectively. The best coefficient is obtained with experimental
area 2, using 5-NN with the WD criterion and TxPower = 0x04 with all the Beacons present. From the same results, it is
observed that in both cases experimental area 2 improves on experimental area 1; this suggests the following criteria
when defining an area. A large area with borders between the sectors generates a drop in the RSSI between them,
providing a better differentiation of the values and increasing classification accuracy. Beacons positioned in the
corners significantly improve the balance of average errors within the area. Beacon Be09, in the middle of one side of
the area, helps differentiate RSSI values between sectors, improving classification. When using k-NN, the precision is
improved by using the WD criterion.
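The exact definition of the mean error coefficient is given in [12]. As an assumption only, the Python sketch below uses the squared mean error expressed as a percentage of the total area, a form that reproduces the value 18.75 for a mean error of 1.50 m over the 12 m² of area 1. The function name is hypothetical.

```python
def mean_error_coefficient(mean_error_m, total_area_m2):
    """Assumed form: squared mean error as a percentage of the total area."""
    if total_area_m2 <= 0:
        raise ValueError("total area must be positive")
    return 100.0 * mean_error_m ** 2 / total_area_m2
```

Under this reading, the same mean error yields a smaller coefficient in a larger area, which is why normalizing by the total area allows areas of different size to be compared.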
4.0 CONCLUSIONS 


To summarize the elements analyzed so far, the following can be indicated:

By studying the behavior of RSSI, the use of methods involving the RSSI-distance relationship is ruled out, due to the
high susceptibility of the signal to noise and to the environment. The best discretization and deployment of an
experimental area was determined, concluding that to work with a fingerprinting method it is advisable to apply border
regions, since they help define a drop in RSSI between sectors. The best algorithm is k-NN, with k = 5 and a
configuration with the 5 Beacons present in the experimental area.